Does Multimodality Help Human and Machine for Translation and Image Captioning?
Authors
Abstract
This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural network models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.
Similar papers
Changes on the Horizon for the Multimedia Community
The Impact of Deep Learning: The development of AI algorithms, represented by deep learning, has bolstered multimedia research. In particular, deep learning has led to a multimodality-based algorithm framework, enabling the effective fusion and use of cross-domain data. Take image and video captioning, for example. A couple of years ago, tagging was the only way to describe images and videos. Bu...
Multimodal Attention for Neural Machine Translation
The attention mechanism is an important part of neural machine translation (NMT), where it has been reported to produce a richer source representation than fixed-length-encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simult...
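The idea of attending over both textual and visual annotations can be illustrated with a minimal sketch. This is not the model from any of the papers above: it assumes simple dot-product scoring and additive fusion of the two context vectors, whereas the actual systems use learned scoring networks and trained parameters. The function names (`attention`, `multimodal_attention`) are illustrative only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(query, annotations):
    """Dot-product attention over a set of annotation vectors.

    Returns the attention weights (which sum to 1) and the
    weighted-sum context vector."""
    scores = annotations @ query      # one score per annotation, shape (n,)
    alpha = softmax(scores)           # normalise into a distribution
    context = alpha @ annotations     # convex combination, shape (d,)
    return alpha, context

def multimodal_attention(query, text_ann, img_ann):
    """Attend separately over textual and visual annotations,
    then fuse the two modality contexts (here: a plain sum)."""
    _, c_txt = attention(query, text_ann)
    _, c_img = attention(query, img_ann)
    return c_txt + c_img

# Toy usage: 5 source-word annotations and 196 image-region
# annotations (e.g. a 14x14 convolutional feature map), all of
# dimension 4, attended by one decoder-state query.
rng = np.random.default_rng(0)
query = rng.normal(size=4)
text_ann = rng.normal(size=(5, 4))
img_ann = rng.normal(size=(196, 4))
fused = multimodal_attention(query, text_ann, img_ann)
```

In a real decoder the query would be the hidden state at each output step, and the fusion (sum, concatenation, or gating) is itself a modelling choice that the multimodal MT literature compares.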
Unraveling the Contribution of Image Captioning and Neural Machine Translation for Multimodal Machine Translation
Recent work on multimodal machine translation has attempted to address the problem of producing target-language image descriptions based on both the source-language description and the corresponding image. However, existing work has not been conclusive on the contribution of visual information. This paper presents an in-depth study of the problem by examining the differences and complementaritie...
Image2Text: A Multimodal Caption Generator
In this work, we showcase the Image2Text system, a real-time captioning system that generates human-level natural language descriptions for any input image. We formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation. Different from...
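Framing captioning as sequence generation conditioned on image features can be sketched as follows. This is a toy illustration, not the Image2Text architecture: the vocabulary, dimensions, and randomly initialised weights are all placeholders standing in for a trained model, and decoding is plain greedy argmax.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["<bos>", "a", "dog", "runs", "<eos>"]
HID, EMB, FEAT = 8, 6, 10

# Random parameters stand in for trained weights.
W_feat = rng.normal(size=(FEAT, HID)) * 0.1       # image features -> initial state
W_emb  = rng.normal(size=(len(VOCAB), EMB)) * 0.1 # token embeddings
W_h    = rng.normal(size=(HID, HID)) * 0.1        # recurrent weights
W_x    = rng.normal(size=(EMB, HID)) * 0.1        # input weights
W_out  = rng.normal(size=(HID, len(VOCAB))) * 0.1 # output projection

def greedy_caption(img_feat, max_len=10):
    """Decode a caption from image features with a vanilla RNN,
    picking the argmax token at each step until <eos> or max_len."""
    h = np.tanh(img_feat @ W_feat)    # condition the decoder on the image
    tok = VOCAB.index("<bos>")
    caption = []
    for _ in range(max_len):
        h = np.tanh(h @ W_h + W_emb[tok] @ W_x)
        tok = int(np.argmax(h @ W_out))
        if VOCAB[tok] == "<eos>":
            break
        caption.append(VOCAB[tok])
    return caption

caption = greedy_caption(rng.normal(size=FEAT))
```

Real systems replace the random weights with parameters trained on paired image-caption data, use LSTM/GRU cells, and typically decode with beam search rather than greedy argmax.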
Image Titles - Variations on Show, Attend and Tell
Inspired by recent advances in machine translation and object detection, we implement an image captioning pipeline, consisting of a Fully Convolutional Neural Network piping image features into an image-captioning LSTM, based on the popular Show, Attend, and Tell model. We implement the model in TensorFlow and recreate performance metrics reported in the paper. We identify and experiment with v...